Q-value Additional Updating Method for Reducing Learning Time

نویسنده

  • Yunsick Sung
چکیده

In this paper, an update method of Q-value is proposed to increase the learning rate of Q-learning. When Q-value of executed action is small, even if it is an optimal action, the learning becomes longer because the frequency to be executed again becomes lower. The proposed method increased the execution frequency of optimal action by forcefully increasing the Q-value through the Q-value update method. In the test, up to 60% higher goal achievement rate was shown at a maximum.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reinforcement Learning in Continuous Time: Advantage Updating

A new algorithm for reinforcement learning, advantage updating, is described. Advantage updating is a direct learning technique; it does not require a model to be given or learned. It is incremental, requiring only a constant amount of calculation per time step, independent of the number of possible actions, possible outcomes from a given action, or number of states. Analysis and simulation ind...

متن کامل

Reinforcement Learning Applied to a Differential Game

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a p...

متن کامل

P14: Anxiety Control Using Q-Learning

Anxiety disorders are the most common reasons for referring to specialized clinics. If the response to stress changed, anxiety can be greatly controlled. The most obvious effect of stress occurs on circulatory system especially through sweating. the electrical conductivity of skin or in other words Galvanic Skin Response (GSR) which is dependent on stress level is used; beside this parameter pe...

متن کامل

Advantage Updating Applied to a Differrential Game

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile a...

متن کامل

On-Line Learning of a Persian Spoken Dialogue System Using Real Training Data

The first spoken dialogue system developed for the Persian language is introduced. This is a ticket reservation system with Persian ASR and NLU modules. The focus of the paper is on learning the dialogue management module. In this work, real on-line training data are used during the learning process. For on-line learning, the effect of the variations of discount factor (g) on the learning speed...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015